Data Science by ODS.ai 🦜 | Telegram Webview: opendatascience/2331 -
βš™οΈ SWE-rebench: Nebius AI R&D team presents new dataset for SWE tasks.

Researchers built an automated system to collect and validate thousands of real-world tasks from GitHub, designed for training and evaluation of LLMs in software engineering.

Main features of the system:
1️⃣ Automatic data collection: Continuously extracts issue-PR pairs from Python repositories.
2️⃣ LLM-based environment setup: An LLM analyzes each repository, creates install instructions, and updates them if errors occur.
3️⃣ Execution-based validation: Each task is validated through automatic setup, a test run, and dependency freezing, which makes it reproducible.
4️⃣ LLM quality annotation: Tasks are labeled for clarity, difficulty, and test correctness to support filtering.
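
A minimal sketch of how steps 2 and 3 could fit together, not the team's actual implementation. All names here (`validate_task`, `propose_steps`, `run_install`, `run_tests`, `freeze_deps`, `max_attempts`) are hypothetical, and the LLM and the sandbox are passed in as plain callables so the loop itself stays runnable. Feeding test failures back into the next proposal is also an assumption; the post only says instructions are updated when errors happen.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunResult:
    ok: bool
    log: str = ""


@dataclass
class ValidatedTask:
    task_id: str
    install_steps: List[str]
    frozen_deps: List[str]


def validate_task(
    task_id: str,
    propose_steps: Callable[[Optional[str]], List[str]],  # stands in for the LLM
    run_install: Callable[[List[str]], RunResult],        # stands in for the sandbox
    run_tests: Callable[[], RunResult],
    freeze_deps: Callable[[], List[str]],
    max_attempts: int = 3,                                # hypothetical retry budget
) -> Optional[ValidatedTask]:
    """Retry setup with error feedback; keep the task only if tests run."""
    error_log: Optional[str] = None
    for _ in range(max_attempts):
        # The proposer sees the previous error log (None on the first try).
        steps = propose_steps(error_log)
        install = run_install(steps)
        if not install.ok:
            error_log = install.log
            continue
        tests = run_tests()
        if not tests.ok:
            error_log = tests.log
            continue
        # Pin the exact dependency versions so the task stays reproducible.
        return ValidatedTask(task_id, steps, freeze_deps())
    return None  # task is discarded after exhausting the retry budget
```

In a real pipeline `freeze_deps` might wrap something like `pip freeze` inside the task's container; that detail is an assumption as well.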

Result:
SWE-rebench dataset: 21,000+ ready-to-use interactive tasks.
Continuous updates: Fresh data is added regularly.
Transparent evaluation: Tasks are used for the public SWE-rebench leaderboard.
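
The quality labels (clarity, difficulty, test correctness) exist to support filtering, which could look roughly like the sketch below. The field names, the numeric scales, and the thresholds are assumptions made for illustration; the real dataset schema may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TaskAnnotation:
    task_id: str
    clarity: int         # hypothetical scale: 1 (vague) to 5 (clear)
    difficulty: int      # hypothetical scale: 1 (easy) to 5 (hard)
    tests_correct: bool  # hypothetical flag for the test-correctness label


def filter_tasks(
    tasks: Iterable[TaskAnnotation],
    min_clarity: int = 4,
    max_difficulty: int = 3,
) -> List[TaskAnnotation]:
    """Keep clearly stated tasks with trustworthy tests, up to a difficulty cap."""
    return [
        t
        for t in tasks
        if t.tests_correct
        and t.clarity >= min_clarity
        and t.difficulty <= max_difficulty
    ]


sample = [
    TaskAnnotation("a", clarity=5, difficulty=2, tests_correct=True),
    TaskAnnotation("b", clarity=2, difficulty=1, tests_correct=True),
    TaskAnnotation("c", clarity=5, difficulty=2, tests_correct=False),
]
selected_ids = [t.task_id for t in filter_tasks(sample)]
```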

🚀 SWE-rebench gives researchers and developers real, validated tasks for working with LLMs in the SWE field.

Technical report: arXiv
Dataset: SWE-rebench



tg-me.com/opendatascience/2331
BY Data Science by ODS.ai 🦜






